 functional domain


Adaptive LASSO estimation for functional hidden dynamic geostatistical model

arXiv.org Machine Learning

We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HDGM). These models employ a classic mixed-effect regression structure with embedded spatiotemporal dynamics to model georeferenced data observed in a functional domain. Thus, the parameters of interest are functions across this domain. The algorithm simultaneously selects the relevant spline basis functions and regressors that are used to model the fixed-effects relationship between the response variable and the covariates. In this way, it automatically shrinks to zero irrelevant parts of the functional coefficients or the entire effect of irrelevant regressors. The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (LASSO) penalty function, in which the weights are obtained from the unpenalised f-HDGM maximum-likelihood estimators. The computational burden of maximisation is drastically reduced by a local quadratic approximation of the likelihood. Through a Monte Carlo simulation study, we analysed the performance of the algorithm under different scenarios, including strong correlations among the regressors. We showed that the penalised estimator outperformed the unpenalised estimator in all the cases we considered. We applied the algorithm to a real case study in which hourly nitrogen dioxide concentrations recorded in the Lombardy region of Italy were modelled as a functional process with several weather and land cover covariates.
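
To make the adaptive LASSO idea concrete, the following is a minimal sketch for ordinary least-squares regression, not the f-HDGM estimator described in the abstract. It assumes a Gaussian linear model with more observations than regressors, takes the adaptive weights from the unpenalised least-squares fit, and replaces the absolute-value penalty with a quadratic surrogate at each iteration. Note that here the surrogate is applied to the penalty term, a common textbook variant, whereas the abstract mentions a quadratic approximation of the likelihood. The function name `adaptive_lasso_lqa`, its parameters, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def adaptive_lasso_lqa(X, y, lam, gamma=1.0, n_iter=200, zero_tol=1e-6):
    """Illustrative adaptive LASSO for least squares (hypothetical helper).

    Approximately minimises 0.5*||y - X b||^2 + lam * sum_j w_j*|b_j|,
    with w_j = 1/|b_ols_j|**gamma, by repeatedly solving a weighted ridge
    problem (quadratic surrogate of the |b_j| penalty).
    Assumes X has full column rank so the unpenalised fit exists.
    """
    XtX, Xty = X.T @ X, X.T @ y
    beta_ols = np.linalg.solve(XtX, Xty)  # unpenalised estimate
    # Adaptive weights: small unpenalised coefficients are penalised more.
    weights = 1.0 / np.maximum(np.abs(beta_ols), 1e-12) ** gamma
    beta = beta_ols.copy()
    for _ in range(n_iter):
        # Coefficients that have collapsed to ~0 are dropped for good.
        active = np.abs(beta) > zero_tol
        if not active.any():
            break
        # |b_j| is replaced by b_j^2 / (2*|b_j_current|) up to a constant,
        # which turns the update into a ridge-type linear solve.
        ridge = lam * weights[active] / np.abs(beta[active])
        new_beta = np.zeros_like(beta)
        new_beta[active] = np.linalg.solve(
            XtX[np.ix_(active, active)] + np.diag(ridge), Xty[active]
        )
        converged = np.max(np.abs(new_beta - beta)) < 1e-10
        beta = new_beta
        if converged:
            break
    beta[np.abs(beta) <= zero_tol] = 0.0
    return beta, beta_ols

# Simulated example (assumed data): two relevant and four irrelevant regressors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_beta + 0.5 * rng.normal(size=200)
beta_hat, beta_ols = adaptive_lasso_lqa(X, y, lam=5.0)
print(np.round(beta_ols, 3))
print(np.round(beta_hat, 3))
```

In this sketch a coefficient that falls below `zero_tol` is removed and never re-enters, which is a known limitation of this kind of quadratic surrogate. The penalty level `lam` is fixed by hand here; in practice it would be chosen by a criterion such as cross-validation.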


A Transformer-based Neural Language Model that Synthesizes Brain Activation Maps from Free-Form Text Queries

arXiv.org Artificial Intelligence

Neuroimaging studies are often limited by the number of subjects and cognitive processes that can be feasibly interrogated. However, a rapidly growing number of neuroscientific studies have collectively accumulated an extensive wealth of results. Digesting this growing literature and obtaining novel insights remains a major challenge, since existing meta-analytic tools are constrained to keyword queries. In this paper, we present Text2Brain, an easy-to-use tool for synthesizing brain activation maps from open-ended text queries. Text2Brain was built on a transformer-based neural network language model and a coordinate-based meta-analysis of neuroimaging studies. It combines a transformer-based text encoder and a 3D image generator, and was trained on variable-length text snippets and their corresponding activation maps sampled from 13,000 published studies. In our experiments, we demonstrate that Text2Brain can synthesize meaningful neural activation patterns from various free-form textual descriptions. Text2Brain is available at https://braininterpreter.com as a web-based tool for efficiently searching through the vast neuroimaging literature and generating new hypotheses.


Identifying Interaction Sites in "Recalcitrant" Proteins: Predicted Protein and RNA Binding Sites in Rev Proteins of HIV-1 and EIAV Agree with Experimental Data

arXiv.org Artificial Intelligence

HIV-1 Rev is one of several clinically important proteins that are "experimentally recalcitrant," i.e., for which it has not been possible to obtain high-resolution structural information. Identifying critical functional residues in Rev is further complicated by the fact that Rev proteins have no significant sequence similarity to any protein with known structure, and that Rev sequences from different species have very little similarity to one another. Our comparison of predictions with experimental data on the Rev proteins from HIV-1 and EIAV demonstrates that sequence-based computational methods can identify residues in "recalcitrant" proteins that interact with other proteins or nucleic acids. When structural information is available for a protein of interest, enhanced prediction accuracy can be achieved (18, 29). Developing improved methods for predicting binding sites will contribute to our understanding of how proteins recognize their targets in cells and may significantly decrease the time needed to precisely map binding sites in the laboratory. The level of accuracy obtained using the sequence-based methods presented here suggests that they could expedite the design of experiments to explore the function of key regulatory proteins, even when no structural information is available, with obvious implications for developing new therapies for both genetic and infectious diseases.

Acknowledgments: This research was supported in part by NIH grants GM 066387 (VH, DD, & RLJ) and CA97936 (SC), by an ISU Center for Integrated Animal Genomics grant (DD, VH & RLJ), and by USDA Formula Funds (SC & DD). We thank Sijun Liu for technical assistance and Jeffrey Sander for useful comments.


Emergence of Ultra-Conserved Protein Domains and Amino Acid Repeats: Adaptation, Competition and Thresholds

AAAI Conferences

Some proteins, such as homeodomain transcription factors, contain highly conserved regions of sequence that cannot be attributed to the constraints imposed by any single function. It has recently been suggested that multiple conserved functional domains overlap and together explain the high conservation of these regions. However, because these highly conserved domains are part of much larger proteins, we are still left with the question of why so many functional domains cluster together. Here we have modeled an evolutionary mechanism that can produce this kind of clustering. Due to adaptive competition between different protein functions for control over amino acid residue identity, conserved functional domains get displaced from regions undergoing adaptive evolution. At first they undergo a steady random walk within the sequence for an indefinite amount of time; however, a threshold is reached when two functional domains happen to come into contact, at which point there is a dramatic shift in the adaptive dynamics such that the domains rapidly converge, lengthen, and evolve overlap, stabilizing at a fully overlapped state. We also studied the evolution of single amino acid tandem repeats (a.k.a. homopeptides), which are especially prevalent in transcription factors. Homopeptides that are encoded by nonhomogeneous mixtures of synonymous codons cannot be explained by the neutral process of replication slippage. Our model provides two ways to explain the origin and maintenance of such repeats, and their over-representation in highly conserved proteins: competition between multiple functional domains for space within a sequence, or reuse of a sequence for many functions over time. Both processes depend on reaching certain critical thresholds; however, they both deterministically cause the evolution of repeats once these thresholds are reached. Further, both of these processes are characteristic of multi-functional proteins such as homeodomain transcription factors.
We conclude that our model can explain two widely recognized features of transcription factor proteins: conserved domains and a tendency to accumulate homopeptides.